Finding information in books: characteristics of full-text searches in a collection of 10 million books
Abstract
Searching large collections of digitized books is a relatively new area in information-seeking and retrieval research, made possible by initiatives such as Google Books and the HathiTrust Digital Library. The availability of large full-text book collections is transforming how users search and interact with information in books, but the characteristics of these changes are unknown. This paper aims to provide insight into the characteristics of full-text searches in a large collection of digitized books and is the first step in a broader research agenda intended to improve book retrieval. To better understand the types of queries that users issue to full-text book collections, we analyzed a full year of anonymized query logs from the HathiTrust Digital Library full-text search engine. We also manually classified a random sample of 600 queries to develop a taxonomy of book search query types. We found that users are beginning to search for information in books instead of searching for books. Searches still largely follow bibliographic models, but, as expected, new types of searches are beginning to take advantage of full-text capabilities. Additionally, comparing the results of our query log analysis to searches in other domains, we found similar search patterns, including short queries, sessions with only a few queries, and users viewing only a few pages of results per query. We discuss how these findings can be used to characterize users of large full-text book collections.
Similar resources
A Comparison of Relationship between Text and Picture in the Selected Iranian and Contemporary American-European Illustrated-Fiction Books Based on the Theory of Maria Nikolajeva and Carole Scott
Illustrated-fiction books are special forms of art that are the combination of text and picture. The relationship between text and picture in this genre is diverse and variegated, and has different effects on the audience; however, little research has been done about it. The goal of this research is to compare text/picture relationship in the selected Iranian and contemporary American-European ...
Europe PMC: Quick tour
What is Europe PMC? Europe PMC [2] is a global, free, biomedical literature repository, providing access to worldwide life sciences articles, books, patents and clinical guidelines. The resource currently contains over 32 million abstracts and more than 4 million full-text articles (see Figure 1). A subset of the full-text information corpus is the open-access literature that can be downloaded ...
A Decade of Book Publishing in the Field of Knowledge and Information Science in Iran (1381-1390 / 2002-2011)
Introduction: To identify the status of book publishing in Knowledge and Information Science in Iran during 2002-2011. Methods: This study is applied research in terms of its aim. The research method is an analytical survey. Data were collected through a checklist. The population consists of 632 books in the field of Knowledge and Information Science published during 2002-2011. The ...
Characteristics of Arabic Identity in Intellectual System of Hisham Kalbi based on his Books on Genealogy
The science of "Genealogy" was one of the branches of History and Historiography during the age of Jāhilīyah (age of ignorance), and it grew rapidly in the Islamic era. In this context, Hisham Kalbi (d. 204 AH. / 819 AD.), as the first author and editor of Genealogy, made a great contribution to the formation and prosperity of this science, with two important texts, the Jamharat Al-Ansab and Nasa...
Surveying the Experts View on the Necessity of Revision in Rating of Children and Adolescents's Books
Background and Aim: The purpose of this study was to determine the current status of non-academic rankings of children's books and to survey experts' views on a scheme for revising the classification of such books. Method: A qualitative approach was employed. The research tool was a questionnaire based on the research objectives. An open-ended interview data collection method was used based...